Learning representations through stochastic gradient descent in cross-validation error

Authors

  • Richard S. Sutton
  • Vivek Veeriah
Abstract

Representations are fundamental to artificial intelligence. The performance of a learning system depends on how the data are represented. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn them through stochastic gradient descent in multi-layer neural networks, an approach called backprop. Learning representations directly from the incoming data stream reduces the human labour involved in designing a learning system; more importantly, it allows a learning system to scale to difficult tasks. In this paper, we introduce a new incremental learning algorithm, called crossprop, which learns the incoming weights of hidden units using the meta-gradient descent approach previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes. The final update equation introduces an additional memory parameter for each of these weights and generalizes the backprop update equation. Our experiments show that crossprop learns and reuses its feature representation when tackling new and unseen tasks, whereas backprop relearns a new feature representation.
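To make the update concrete, here is a minimal sketch of a crossprop-style learner for a single-hidden-layer network with a linear output and squared error, in plain Python/NumPy. Everything here is illustrative: the names U, w, h, and alpha, the sigmoid hidden units, and the toy target are assumptions rather than the paper's notation, and the memory update follows one simplified, diagonal-approximation reading of the meta-gradient idea, not the paper's exact algorithm. The memory h[i, j] tracks how the outgoing weight w[j] has come to depend on the incoming weight U[i, j].

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid = 4, 8
    alpha = 0.01                            # shared step size (an assumption)

    U = rng.normal(0.0, 0.1, (n_in, n_hid)) # incoming weights: input -> hidden
    w = np.zeros(n_hid)                     # outgoing weights: hidden -> output
    h = np.zeros((n_in, n_hid))             # memory: h[i, j] ~ d w[j] / d U[i, j]

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    for t in range(10_000):
        x = rng.normal(size=n_in)
        y_target = np.sin(x.sum())          # toy regression target (illustrative)

        phi = sigmoid(x @ U)                # hidden feature vector
        y = w @ phi                         # linear output
        delta = y_target - y                # scalar prediction error

        dphi = phi * (1.0 - phi)            # sigmoid derivative at the hidden units

        # Incoming-weight update: the usual backprop path through phi, plus a
        # meta-gradient path that credits U through the learned w, via h.
        U += alpha * delta * (np.outer(x, w * dphi) + h * phi)

        # Memory update: differentiate the LMS update of w with respect to U
        # under a diagonal approximation, decaying stale credit.
        h = h * (1.0 - alpha * phi ** 2) + alpha * delta * np.outer(x, dphi)

        # Standard LMS update of the outgoing weights.
        w += alpha * delta * phi

Dropping the h * phi term and the memory update recovers the ordinary backprop update that the abstract contrasts against.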


Similar resources

Gradient-based Hyperparameter Optimization through Reversible Learning

Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization...
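As a rough illustration of the idea in this snippet, the sketch below (plain Python/NumPy; the toy data and all names are assumptions) chains derivatives through a short gradient-descent run on a linear-regression problem and reads off the derivative of the validation loss with respect to the step size. For simplicity it propagates the tangent p = dw/d(lr) forward through training; the paper instead runs reverse-mode differentiation through a memory-efficient reversed replay of the training dynamics, which is what lets it scale to thousands of hyperparameters.

    import numpy as np

    rng = np.random.default_rng(1)
    w_true = np.array([1.0, -2.0, 0.5])
    Xtr = rng.normal(size=(32, 3)); ytr = Xtr @ w_true     # toy training set
    Xval = rng.normal(size=(16, 3)); yval = Xval @ w_true  # toy validation set

    def grad_train(w):          # gradient of the mean squared training error
        return 2.0 / len(ytr) * Xtr.T @ (Xtr @ w - ytr)

    H = 2.0 / len(ytr) * Xtr.T @ Xtr   # its Hessian (constant for a quadratic loss)

    lr, steps = 0.1, 50
    w = np.zeros(3)             # weights being trained
    p = np.zeros(3)             # tangent p = d w / d lr, chained through training
    for _ in range(steps):
        g = grad_train(w)
        p = p - g - lr * (H @ p)    # differentiate the step w = w - lr * g w.r.t. lr
        w = w - lr * g

    grad_val = 2.0 / len(yval) * Xval.T @ (Xval @ w - yval)
    print("d(validation loss)/d(lr) =", grad_val @ p)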


Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection phase is often ignored. In fact, in theoretical works most of the time assumptions are made, for example, on the prior knowledge of the norm of ...


Predictive State Smoothing (PRESS): Scalable non-parametric regression for high-dimensional data with variable selection

We introduce predictive state smoothing (PRESS), a novel semi-parametric regression technique for high-dimensional data using predictive state representations. PRESS is a fully probabilistic model for the optimal kernel smoothing matrix. We present efficient algorithms for the joint estimation of the state space as well as the non-linear mapping of observations to predictive states and as an al...


Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks

Representations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used for representing the data. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn these representations through stochastic gradient descent in multi-layer neural networks, which is called backprop. ...


Learning Similarity with Operator-valued Large-margin Classifiers

A method is introduced to learn and represent similarity with linear operators in kernel induced Hilbert spaces. Transferring error bounds for vector valued large-margin classifiers to the setting of Hilbert-Schmidt operators leads to dimension free bounds on a risk functional for linear representations and motivates a regularized objective functional. Minimization of this objective is effected...



Journal:
  • CoRR

Volume: abs/1612.02879  Issue:

Pages: -

Publication date: 2016